SUMMARY


The Air Force Occupational Measurement Center (USAFOMC) conducts occupational surveys of Air
Force specialties. These include the collection of supervisors' ratings on task factors such as
recommended emphasis for first-term training. The task training emphasis ratings serve as input
to the Instructional System Development (ISD) training model, which guides the development and
revision of technical training courses. Analysis of training emphasis ratings is usually
performed using REXALL, a special-purpose program within the Comprehensive Occupational Data
Analysis Programs (CODAP) system. Two important functions of REXALL are to assess the overall
level of agreement among raters and to calculate an average (mean) factor rating for each task.
When an acceptable level of interrater agreement is attained, the task means are rank-ordered.
This rank-ordering constitutes the recommended priority of training for each of the tasks and
defines the common rating policy (CRP) for the specialty.

For a small number of specialties, referred to as "complex specialties," interrater agreement
is frequently so poor that it precludes the extraction of a reliable training emphasis CRP.
Driven by the suggestion that poor interrater agreement may be caused by competing rating
policies with possible relevance to training, a Request for Personnel Research (RPR) was
initiated by USAFOMC and validated through Hq Air Training Command. The RPR requested
development of a methodology for identifying multiple rating policies that might exist in such
data.

Research on the possible causes of poor interrater agreement followed two main courses: (a)
investigation of the variation in interrater agreement with respect to the number of raters used
(sample size) and (b) investigation of the multiple-rating-policy hypothesis via three
independent analysis techniques: modified REXALL analysis, cluster analysis, and factor
analysis. These techniques were applied to seven "complex specialties" to see if multiple rating
policies could be identified.

Interrater agreement was found to vary within and across different sample sizes. A sample
of approximately 55 raters is the minimum number recommended for extraction of a reliable CRP.
REXALL analyses were inconclusive with respect to confirming the presence or absence of multiple
rating policies. Cluster analyses using existing CODAP software also proved to be generally
inadequate for identifying multiple rating policies. However, some CODAP programs that report
rater responses in clustering (KPATH) sequence were found to be highly useful for interpreting
observed REXALL statistics.

Results of principal components factor analyses clearly demonstrated that the samples of
training emphasis ratings were less complex than expected. A one-factor solution confirmed that
REXALL analyses which employ modified CRP extraction criteria are appropriate and sufficient for
single-specialty samples which contain a dominant CRP. Where such REXALL analysis failed,
additional analysis using a VARIMAX rotation/factor-building methodology successfully isolated
significantly different multiple rating policies.

It is recommended that REXALL analyses with modified CRP extraction criteria be used for the
vast majority of single-ladder specialties, where one might expect a single dominant training
policy. In those cases where evidence suggests that multiple policies might be operative,
principal components factor analysis with VARIMAX rotation is recommended, extracting one and
then multiple factors as appropriate. Interpretation of these results can be enhanced with CODAP
auxiliary programs (DUVARS, PRTDIS, PRTVAR, and FACPRT).
Preface


This work resulted from Request for Personnel Research (RPR) 79-1, Analysis of Ratings
by Occupational Task Factors, from Headquarters Air Training Command, and was initiated
under Work Unit 77340750, Complex Specialties Task Training Priority Equation
Development. It was subsequently completed under Work Unit 77191911, Measurement and
Analysis of Job and Mission Requirements. The present effort represents a portion of the
Laboratory's Force Acquisition and Distribution System thrust.

Dr. William Alley and Dr. Hendrick Ruck provided helpful suggestions and significant
assistance in the conduct of this effort.
TRAINING EMPHASIS TASK FACTOR DATA: METHODS OF ANALYSIS


I.  BACKGROUND 

The Air Force Occupational Measurement Center (USAFOMC) conducts task-based occupational
surveys of Air Force specialties. These surveys include the collection of supervisors' ratings
on task factors such as recommended training emphasis. Recommended training emphasis is defined
as the emphasis that should be given in structured training of the task for entry-level airmen,
regardless of where that training takes place (i.e., resident course, Field Training Detachment,
or on-the-job training). First-term training priorities are input to the Instructional System
Development (ISD) training model, which guides the development and revision of specialty training
courses. The utility, reliability, and validity of training emphasis ratings in terms of ISD
theory have been demonstrated by Ruck, Thompson, Brown, and Stacy (in preparation).

For approximately 20% of specialties, training emphasis ratings have been quite difficult to
interpret because of poor interrater agreement. The suggestion has been that the data for such a
"complex specialty" may contain conflicting rating policies aligned with the various employment
duties/areas within a specialty. Currently, there are no satisfactory operational techniques for
identifying such multiple policies. Research to develop a methodology for identifying the
various rating perceptions that may exist in training emphasis ratings was initiated as a result
of a Request for Personnel Research (RPR 79-1), Analysis of Ratings by Occupational Task Factors,
submitted by Headquarters Air Training Command.

Analysis of training emphasis rating data is usually performed using REXALL, a
special-purpose program developed and documented by Christal and Weissmuller (1976) within the
Comprehensive Occupational Data Analysis Programs (CODAP) system. The three main functions of
REXALL are (a) to assess the level of interrater agreement, (b) to identify divergent raters, and
(c) to calculate the mean factor rating for each task. With respect to overall interrater
agreement, REXALL is designed to cope with a sample of raters who are anticipated to be
relatively homogeneous in terms of their rating ability.

Ratings for first-term training emphasis are made using a 9-point scale: from 1 (extremely
low) to 9 (extremely high). However, the instruction to "rate only tasks which you believe
require training for first-termers" recognizes the validity of a zero rating. By default, all
non-ratings are interpreted to mean "no training recommended" and are included as zeros in all
REXALL calculations, including the mean training emphasis for each task.

As a measure of interrater agreement, REXALL computes two indices of interrater reliability
using the intraclass correlation formulas reported by Lindquist (1953). The two indices are
R11, single-rater reliability, which approximates the average of all possible pair-wise rater
correlations; and Rkk, reliability for a sample of k raters, which is the expected correlation
between the set of observed sample task means and the task means of a hypothetical equivalent
sample. Values of R11 and Rkk that meet or exceed minimum criterion values are interpreted as
meaning that sufficient interrater agreement exists to produce stable estimates of task mean
values.
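
The two indices can be illustrated with a short computation. The sketch below is not the REXALL code: it assumes a one-way analysis of variance form of the intraclass correlation (between-task and within-task mean squares), which may differ in detail from the Lindquist (1953) formulas that REXALL implements. The function name and the tasks-by-raters list layout are illustrative assumptions.

```python
from statistics import mean


def interrater_reliability(ratings):
    """Return (single-rater, k-rater) reliability for a tasks-by-raters table.

    ratings[t][r] is the rating given to task t by rater r, with non-ratings
    entered as 0. Assumes a one-way ANOVA intraclass correlation.
    """
    n_tasks = len(ratings)
    k = len(ratings[0])
    grand_mean = mean(x for row in ratings for x in row)
    task_means = [mean(row) for row in ratings]

    # Between-task mean square: spread of the task means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in task_means) / (n_tasks - 1)

    # Within-task mean square: disagreement among raters on the same task.
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, task_means) for x in row
    )
    ms_within = ss_within / (n_tasks * (k - 1))

    r11 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    rkk = (ms_between - ms_within) / ms_between
    return r11, rkk


# Hypothetical ratings: 4 tasks rated by 3 raters.
example = [[1, 2, 0], [5, 4, 6], [9, 7, 8], [3, 3, 0]]
r11, rkk = interrater_reliability(example)
```

Under this form, the k-rater index equals the Spearman-Brown projection of the single-rater index to k raters, so the two values always move together.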


The standard REXALL analysis procedure for achieving acceptable interrater agreement and a
set of reliable task mean ratings is to identify and delete divergent raters, as discussed by
Goody (1976). Divergent raters are those whose ratings differ significantly from the ratings of
the majority of raters because of failure to follow instructions, inverted or poor discriminative
use of the rating scale, unique perception of tasks, or lack of knowledge. These divergent rater
characteristics are reflected by a low or negative correlation between the individual rater's set
of ratings and the sample task means (excluding the subject rater's ratings), and/or a low
t-value (confidence level associated with the correlation being different from zero). A typical
rater sample is assumed to have a simple structure consisting of a majority of good raters who
yield a set of stable task means and a minority of divergent raters who individually disagree
with the majority rating pattern. For determining training emphasis, the rank-ordered task means
computed from the ratings of the residual good raters constitute the recommended training
priority and define the common rating policy (CRP).
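
The screening step can be sketched as follows. This is an illustrative reconstruction, not the REXALL procedure itself: it assumes the t-value is the usual test statistic for a correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n the number of tasks, and it flags a rater when either the correlation or the t-value falls below its minimum. The default thresholds (.30 and 3.0) are the criterion values quoted later in this section; the function names are hypothetical.

```python
from math import copysign, inf, sqrt


def pearson(x, y):
    """Pearson product-moment correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_divergent(ratings, min_r=0.30, min_t=3.0):
    """Return [(rater_index, r, t)] for raters failing either criterion.

    ratings[t][j] is the rating of task t by rater j. Each rater is
    correlated with the task means of all OTHER raters.
    """
    n_tasks = len(ratings)
    k = len(ratings[0])
    flagged = []
    for j in range(k):
        own = [row[j] for row in ratings]
        others = [(sum(row) - row[j]) / (k - 1) for row in ratings]
        r = pearson(own, others)
        if abs(r) >= 1.0:
            t = copysign(inf, r)  # perfect correlation: t is unbounded
        else:
            t = r * sqrt(n_tasks - 2) / sqrt(1.0 - r * r)
        if r < min_r or t < min_t:
            flagged.append((j, r, t))
    return flagged


# Hypothetical data: raters 0-2 broadly agree; rater 3 inverts the scale.
sample = [
    [1, 2, 1, 9],
    [3, 3, 4, 7],
    [5, 6, 5, 5],
    [7, 7, 8, 3],
    [9, 8, 9, 1],
    [2, 1, 2, 8],
]
divergent = flag_divergent(sample)
```

Excluding the subject rater from the mean keeps a rater's own ratings from inflating the correlation, which matters most in small samples.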

The REXALL program provides no information as to why, for some specialties, R11 remains low
even after successive deletions of divergent raters. The rationale underlying the present effort
is that for such specialties, a low R11 may be a function of conflicting multiple rating
policies, each associated with a subgroup of raters sharing similar training perceptions aligned
with a specific employment area within the specialty. If this is the case, then the mean
ratings across a total specialty sample may not reflect any meaningful policy, and significant
policy differences may be obscured by the averaging process.

The present study was aimed at developing a technique to identify and describe such different
policies which, when present, may account for the low interrater reliabilities obtained for some
specialties. In designing the approach, it was recognized that other factors may also contribute
to low interrater agreement. Five factors in all were regarded as possible sources of error:
(a) random sampling variance, (b) multi-ladder task lists, (c) random variation in rater
responses, (d) presence of divergent raters, and (e) multiple rating policies. The first of
these, random sampling variance, was investigated by observing the effects on R11 of repeated
samplings involving different numbers of raters. The remaining factors were investigated
employing modified REXALL analysis, CODAP cluster analysis, and factor analysis. These
techniques are described under "Findings." The paragraphs that follow discuss the five possible
causes of low R11.

1. Random sampling variance, a function of sample size, was considered to be a potentially
significant cause of low interrater agreement. The average operational training emphasis sample
size is 45 supervisory raters, with a range of 10 to 80 raters. The sample size is primarily a
function of supervisory rater availability. Statistically, there is a greater chance of
obtaining an unrepresentative sample with abnormally low (or high) interrater agreement for the
smaller samples. The relationship between sample size and the interrater reliability indices,
R11 and Rkk, is algebraically summarized by the Spearman-Brown prophecy formula. In general
terms, it states that Rkk increases as R11 and sample size increase. The criterion minimum
for acceptable single-rater reliability, R11 = .20, is obtained from this formula by the
insertion of Rkk = .90 as a widely recognized criterion minimum for stable task means, and a
sample size of approximately 40 raters, which is regarded as sufficiently large to be stable.
Estimation of this minimum sample size assumes that the level of interrater agreement and the
basis for agreement (rating policy) within the sample reflect those of the parent population. To
address the issue of the stability of R11 as a function of sample size, two large,
single-specialty rater samples were taken as independent finite populations, and 100 subsamples
for each of 12 sample-size points in the 10- to 100-rater range were randomly selected and
assessed for level of single-rater reliability (R11). The results are provided in the "Findings"
section of this report.
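
The relationship can be made concrete with a short calculation. The sketch below assumes the standard form of the Spearman-Brown prophecy formula, Rkk = k * R11 / (1 + (k - 1) * R11); the function names are illustrative and are not part of REXALL or CODAP.

```python
def spearman_brown(r11, k):
    """Project single-rater reliability r11 to a sample of k raters."""
    return k * r11 / (1 + (k - 1) * r11)


def raters_needed(r11, target_rkk):
    """Solve the same formula for k: raters required to reach target_rkk."""
    return target_rkk * (1 - r11) / (r11 * (1 - target_rkk))


# Criterion values quoted in the text.
rkk_at_40 = spearman_brown(0.20, 40)   # reliability of 40-rater task means
k_for_90 = raters_needed(0.20, 0.90)   # raters needed for Rkk = .90
```

With R11 = .20, about 36 raters project to Rkk = .90, and 40 raters project to roughly .91, which is consistent with the criterion values quoted above.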


2. Where more than one specialty is surveyed with a single comprehensive survey instrument
(i.e., for multi-ladder task lists), a low R11 may be attributable to conflicting
specialty-aligned interests with little or no common training recommended. REXALL analysis would
obviously be inappropriate under this condition. Analysis results of a dual-specialty sample,
both in combined form and as two single specialties, are included in the investigation of
multiple rating policies.
3. Random variation in rater responses may occur where most raters disagree because of their
highly individual interpretations of the task list and/or rating scale. This represents the
extreme multiple-rating-policy condition. Although the research approach taken here uses cluster
and factor analyses as primary methods, an understanding of how interrater agreement is assessed,
and how rating policies are examined using existing techniques, is in order. Being the primary
ratings analysis tool readily available in CODAP, REXALL is normally used for analyses of all
ratings.

4. The presence of divergent raters may serve to depress interrater agreement. Existing
REXALL procedures for extracting a reliable CRP involve the initial deletion of the divergent
raters (pass 1) and, if necessary, deletion of any newly identified divergent raters (pass 2).
Divergent raters are eliminated from the sample to achieve stable estimates of task means.
Consistently observed increases in R11 and Rkk resulting from the deletion of divergent
raters in operational samples support this procedure and contribute to the face validity of the
following USAFOMC CRP extraction criteria for training emphasis: (a) minimum acceptable level of
interrater agreement, R11 = .20, Rkk = .90; (b) minimum acceptable rater correlation with
mean, r = .30 and/or t-value = 3.0; (c) deletion boundaries: maximum of two deletion passes,
maximum of 10% of raters deleted; and (d) minimum number of good raters, 40. Complex specialties
are defined as those whose training emphasis ratings fail to provide a reliable CRP via
application of these procedures and criteria. However, the presence of an inordinate number of
divergent raters may disguise an underlying CRP to an extent which renders existing CRP
extraction criteria unsuitable. If, on the other hand, excessive rater divergence is viewed not
as a distinction between good and poor raters, but as an indicator of multiple rating policies,
then the fifth factor comes into play. This factor assumes the adequacy of the listed CRP
extraction criteria for small or moderate divergence and assumes complexity to be attributable to
competing rating policies when interrater agreement and divergence criteria are not met. It is
important to note that the multiple rating policy condition precludes neither the possibility of
a CRP which is not readily discernible via standard REXALL analysis nor the existence of
divergent raters.

5. Multiple rating policies can be defined in terms of differences in the rank-ordering of
tasks between various paired subgroups of raters. A Spearman rank-order correlation with an
r_s < .50 was taken as indicating a practical difference in the recommended training priority
between any two rating policy groups. These differences may be attributed to any combination of
differences in number, type, and level of tasks recommended. The greatest possible difference
between any two policies is that they recommend totally different sets of tasks for training.
Relatively small policy differences would result from minor variation in the level of
recommendations on the same set of tasks. In relation to meaningful alternative training
policies, it would be highly desirable for raters within significantly different rating policy
groups to share a common background characteristic such as job title or major command (MAJCOM),
which could be viewed as explanatory factors contributing to policy differences.
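
A comparison of two rating policy groups along these lines might look like the sketch below. It assumes each group is summarized by its vector of task means, computes the Spearman rank-order correlation as the Pearson correlation of average ranks (so tied task means share a rank), and applies the .50 cutoff stated above. All function names and the example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for p in range(i, j + 1):
            ranks[order[p]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation between two task-mean vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def policies_differ(means_a, means_b, cutoff=0.50):
    """True when the rank orders differ in the practical sense used here."""
    return spearman(means_a, means_b) < cutoff


# Hypothetical task means for three rater subgroups over five tasks.
group_a = [8.1, 6.0, 4.2, 2.0, 0.5]
group_b = [0.4, 2.2, 4.0, 6.5, 8.0]   # reversed priorities
group_c = [7.9, 6.2, 4.0, 2.5, 0.1]   # same order, different levels
```

Because only ranks enter the comparison, two groups that order the tasks identically but differ in rating level (as with groups A and C) are treated as sharing a policy.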

The postulated single-specialty rating policy domain is summarized in Figure 1. The simple
or complex specialty classification corresponds to achievement or nonachievement of a reliable
CRP employing the previously described standard REXALL analysis procedure and criteria. The
multi-ladder sample type is not included in Figure 1 since this type is obviously predisposed to
being complex and is, therefore, unsuitable for REXALL analysis.
[Figure 1 (diagram): SINGLE-SPECIALTY SAMPLE, with two branches.
  Achievement of reliable CRP: SIMPLE SPECIALTY
    - CRP (includes all raters)
    - CRP with divergency of 10% or less
  Nonachievement of reliable CRP: COMPLEX SPECIALTY
    - CRP with divergency greater than 10%
    - Two or more competing policies (no CRP)
    - No main policies]

Figure 1. Single-specialty rating policy domain.


In the current investigation various analytical techniques were tested with training emphasis
data from six specialties. Details for the six training emphasis data sets analyzed in this
study are summarized in Table 1. The first two data sets were obtained from USAFOMC as examples
of complex specialties with very poor interrater agreement. The third USAFOMC data set, a
two-career-ladder study, was analyzed both in the combined form and as two single-specialty
samples. The remaining two data sets were for specialties deemed complex as a consequence of the
AFHRL training emphasis equation study (Ruck et al., in preparation). Application of standard
criteria for deletion of divergent raters produces the levels of interrater agreement shown in
Table 2. All samples fail to qualify as a simple specialty under strict application of the 10%
maximum deletion criterion. However, the relatively high levels of interrater agreement for AFSCs
328X0, 328X1, and 672X2 suggest that these specialties are simple rather than complex. Attainment
of minimum interrater agreement with a relatively high deletion percentage for AFSCs 811X0 and
304X4 renders them possible complex specialties. The small AFSC 404X0 sample and the
dual-specialty AFSC 328XX sample are complex.


II.  FINDINGS 

The findings presented pertain to the investigations of sampling error and multiple rating
policies as possible causes of poor interrater agreement.


Sampling Variations


Two specialties, 304X4 and 672X2, were selected as probable complex specialties, and rating
data were collected from especially large samples of raters to permit analysis of sample size
effects. Table 3 details the variation in R11 at three sample sizes (10, 50, and 100 raters)
for the two specialties. In each case, the average R11 (X) and variation in R11 (SD) are for
100 random subsamples. The observed range in R11 is described by the MIN and MAX values, which
illustrate the extent to which observed interrater agreement differed from that of the parent
population for a typical operational sample of 10 to 100 raters. The relationship between the
stability of R11 (SD of R11) and sample size is graphically summarized by the curves through
the data points in Figure 2. Both Table 3 and Figure 2 demonstrate that, for corresponding
sample sizes, the variation in R11 for the AFSC 672X2 raters is greater than that for the AFSC
304X4 raters, with stabilization of R11 (SD = .02) occurring at n = 100 and n = 50,
respectively. With respect to establishing a suitable sample size for REXALL analysis, both
specialties are sufficiently stable at the 50- to 60-rater size to permit extraction of the CRP
(if present). For sample sizes much below 50 raters, the problem of sampling error, as a cause
of poor interrater agreement, is more significant.
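
The subsampling procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the program used in the study: raters are drawn without replacement from the large sample treated as a finite population, single-rater reliability is computed with a one-way ANOVA intraclass correlation (which may differ from the Lindquist formulas in REXALL), and the function names, seed, and simulated data are hypothetical.

```python
import random
from statistics import mean, stdev


def single_rater_reliability(ratings):
    """One-way ANOVA intraclass correlation for a tasks-by-raters table."""
    n_tasks, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    task_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in task_means) / (n_tasks - 1)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, task_means) for x in row)
    ms_within = ss_within / (n_tasks * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def subsample_stability(ratings, size, draws=100, seed=0):
    """Summarize R11 over random rater subsamples of the given size."""
    rng = random.Random(seed)
    n_raters = len(ratings[0])
    values = []
    for _ in range(draws):
        chosen = rng.sample(range(n_raters), size)  # without replacement
        subset = [[row[c] for c in chosen] for row in ratings]
        values.append(single_rater_reliability(subset))
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Simulated population: 30 tasks, 120 raters, noisy ratings around a
# shared task value. These numbers are invented for illustration only.
rng = random.Random(1)
true_values = [rng.uniform(0, 9) for _ in range(30)]
population = [[v + rng.gauss(0, 3) for _ in range(120)] for v in true_values]
summary = {n: subsample_stability(population, n) for n in (10, 50, 100)}
```

Each entry of `summary` holds the same four statistics that Table 3 reports (mean, SD, MIN, and MAX of R11) for one sample size.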


Table 1. Training Emphasis Data Samples Analyzed
with All Raters Included

                                                           Number
AFSC    Title                                    Source    Raters   Divergents   R11   Rkk

404X0   Precision Imagery and Audio-Visual       USAFOMC       47       12       .09   .73
        Media Maintenance
811X0   Security Specialist                      USAFOMC      120       23       .15   .95
328XX   Avionics Communications/Navigation       USAFOMC      148       34       .12   .95
        Systems
328X0   Avionic Communications Systems           USAFOMC       65       11       .41   .98
328X1   Avionic Navigation Systems               USAFOMC       83        7       .27   .97
672X2   Disbursement Accounting                  AFHRL        149       20       .26   .98
304X4   Ground Radio Communications Equipment    AFHRL        335       48       .17   .98

Note. R11 and Rkk values are for the total sample (Number Raters), which
includes the number of divergents (r < .30) shown.


[Figure 2 (plot): stability of R11 (SD of R11) plotted against sample size, 10 to 100 raters.]

Figure 2. Stability of R11 versus sample size.

Detecting Multiple Rating Policies


Modified REXALL Analysis

Given that REXALL is specifically designed to evaluate rater performance with respect to a
single rating policy, deploying it as a tool to assist with the identification of multiple rating
policies within a single data set requires that rater subgroups representing potential rating
policies be somehow preselected. Modified REXALL analysis involved two different methods for
predefining potential rating policy groups.

First, the possibility that a complex rating data set might be comprised of one dominant
policy and a smaller minor policy was investigated by iteratively applying REXALL: i.e., by
removing the raters having a relatively high correlation with the sample mean vector from the
original set of raters and running REXALL on the two resulting sets of raters until stable
policies and assorted divergent raters have been identified. This approach assumes that the
sample mean vector is driven by the dominant policy raters and requires an arbitrary criterion
correlation point to establish potential rating policy group membership. Tables 4 and 5 contain
the distribution and percent occurrence of rater correlations produced by the respective sample
mean vectors. A criterion correlation point of .30 to divide raters led to the same dominant
policy results as those produced by the existing procedure for extracting the common rating
policy (see Table 2). REXALL analysis of the potential minor policy groups resulted in very poor
interrater agreement for all samples. Adjustment of the criterion correlation point to .40
produced very stable dominant policies for all specialties except AFSCs 404X0 and 328XX. All
potential minor policy groups displayed very poor interrater agreement. Considering the arbitrary
nature of the criterion correlation point, and the questionable assumption that similar rater
correlations equate to similar rating patterns, the results for all samples were inconclusive
with respect to confirming the presence or absence of the dominant/minor policy condition. In
general, this method was a poor one for dealing with complex specialties.


Table 4. Frequency of Occurrence of Rater Correlations
(Pearson Product-Moment Correlations)

                          Number of Raters Correlating with the Mean (Interval)
        No. of              1.0-  .89-  .79-  .69-  .59-  .49-  .39-
AFSC    Raters   R11   Rkk   .90   .80   .70   .60   .50   .40   .30  <.30

404X0       47   .09   .73                       5    10     9    11    12
811X0      120   .15   .95                 3    20    30    29    15    23
328XX      148   .12   .95                       2    22     4    46    34
328X0       65   .41   .98     2    21    19     7     4     0     1    11
328X1       83   .27   .97           4    20    21     9    15     7     7
672X2      149   .26   .98          30    32    19    18    19    11    20
304X4      335   .17   .98                15    58    93    78    35    48

Note: The ranges are for Pearson Product-Moment Correlation Coefficients (r)
between individual raters and the mean rating. For example, for AFSC 404X0 there are
five raters who correlate less than .7 but greater than or equal to .6 with the total
sample task mean vector.


A second modified REXALL analysis method involved the analyses of potential rating policy
groups comprised of raters with common background variables such as duty title, major command,
and specialty code. Previously recorded high levels of interrater agreement for the two separate
specialties, AFSC 328X0 and AFSC 328X1, drawn from the AFSC 328XX dual-ladder sample, constitute
the only interpretable success for this method. The inconsistency of results for all other
samples rendered this approach unsuitable.
Table 5. Percentage of Occurrence of Rater Correlations
(Pearson Product-Moment Correlations)


Note: Percentage distribution of all REXALL rater correlations with
respect to three categories: good raters (r ≥ .4); doubtful raters
(.4 > r ≥ .3); and divergent raters (r < .3).


Cluster Analysis

The CODAP clustering programs were applied to the samples in an attempt to develop new
procedures and guidelines for using and interpreting existing clustering software with task
factor data. Appendix A provides a description of the clustering program, the similarity
measure (percent training emphasis in common), and auxiliary CODAP programs used to interpret the
clusterings. For all samples, the percent-training-emphasis-overlap algorithm aggregated
raters who were very homogeneous with respect to the number and type (by duty) of tasks rated.
REXALL analysis of these main rater groups produced significantly higher values of R11 and
higher individual rater correlations with their respective group task mean vectors than were
observed with the parent sample. This indicated that those raters who have high overlap with one
another on the ratings of tasks they choose to recommend for training display a high level of
overall interrater agreement. Merging of these groups resulted in rater clusters with reduced
levels of interrater agreement.
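
The similarity measure can be sketched as follows. Appendix A gives the authoritative definition; the version below is an assumption modeled on the usual percent-overlap idea, in which each rater's ratings are first converted to percentages of that rater's total rating sum and the overlap between two raters is the sum, over tasks, of the smaller of the two percentages. The function names and example ratings are hypothetical.

```python
def to_percentages(ratings):
    """Express one rater's ratings as percentages of that rater's total."""
    total = sum(ratings)
    if total == 0:
        return [0.0 for _ in ratings]  # rater recommended no training at all
    return [100.0 * x / total for x in ratings]


def percent_overlap(rater_a, rater_b):
    """Percent training emphasis in common between two raters (0 to 100)."""
    pa, pb = to_percentages(rater_a), to_percentages(rater_b)
    return sum(min(x, y) for x, y in zip(pa, pb))


# Hypothetical ratings on four tasks (0 means the task was not rated).
high_rater = [9, 9, 0, 0]
low_rater = [3, 3, 0, 0]     # same tasks as high_rater, lower level
other_rater = [0, 0, 5, 5]   # entirely different tasks

same_tasks = percent_overlap(high_rater, low_rater)
different_tasks = percent_overlap(high_rater, other_rater)
```

Two raters who choose the same tasks overlap fully even when one rates every task at 9 and the other at 3, so under this assumed measure the grouping responds mainly to which tasks are rated rather than to how highly they are rated.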

Group rating policies differed to varying degrees in their rank-ordering of tasks. Within
each sample, the strongest differences (r_s < .50) occurred between groups rating virtually all
or many tasks across all duties and those rating few tasks across duties or rating tasks confined
to very few duties. These rating policy groups were minor in number and size and represent
raters with extreme training recommendations. Less prominent policy differences (r_s > .10)
occurred between groups rating closer to the sample average number of tasks rated. Raters in
these groups constituted the bulk of each sample and tended to emphasize much the same technical
duties, which contained a large, common core of high-training-priority tasks.


The dual-specialty AFSC 328XX sample and the small AFSC 404X0 sample clusterings exhibited
individual differences not observed in the other clusterings. For the AFSC 328XX sample, 89% of
raters clustered into two single-specialty groups: AFSC 328X0 or AFSC 328X1. Within each
single-specialty group, rating policy correlations are highly positive (r_s > .50). Across
specialty groups, rating policy correlations are negative. The AFSC 404X0 clustering produced
three small rater groups which account for only 63% of the sample. All three group rating
policies demonstrate significant differences highlighted by very low between-rating-policy
rank-order correlations (r_s < .50). Ungrouped raters (27%) were regarded as heterogeneous,
isolate raters.

A valuable feature of the CODAP system is the capability to process rater background
information. The CODAP DUVARS, PRTDIS, and PRTVAR rater data summaries in clustering (KPATH)
sequence were found to be useful aids for interpreting observed REXALL interrater reliability
statistics and rater correlations. The PRTVAR program can be utilized to summarize
biographies in the KPATH clustering sequence to determine the extent of shared background
characteristics within rater groups. For all single-specialty samples, rater characteristics,
such as grade, major command, primary and duty specialty, and job title/work station (available
only for AFSCs 672X2 and 304X4), could not be discerned to have any obvious connection with
cluster groups. Application of discriminant analysis to establish the extent to which background
variables predict cluster group membership failed to detect any meaningful associations. In the
case of the dual-specialty sample (AFSC 328XX), raters clearly clustered into primary duty rating
policy groups: i.e., either AFSC 328X0 or AFSC 328X1.

In summary, the CODAP clustering of training emphasis ratings produced cluster structures
comprised of a number of rater groups with rating policy differences which were mainly a function
of variation in the number and type of tasks and duties raters chose to recommend for training.
However, four limitations are seen as major obstacles to accepting the training emphasis cluster
structures as a generally suitable method for identifying multiple rating policies. First, the
adjustment of ratings to a percentage of a rater's total rating sum results in the loss of
important information about the level (magnitude) of assigned ratings. Second, the overall
clustering is strongly driven by overlap over all non-zero-rated tasks, which detracts from
common duty emphasis. Third, subjective decisions are required to determine the cluster group
boundaries. Last, the status of the considerable number of isolate raters (5% to 20%) is
unknown. Because of these limitations, the clustering of training emphasis ratings is regarded
as generating a rater sequence incorporating rater subsets which are useful only as a meaningful
summary of rater characteristics and not representative of multiple rating policies.

Since a CODAP approach, if successful, would offer many operating conveniences, five
additional approaches were tested for making use of the clustering programs. These techniques,
which were based on assumptions not reported here, involved different treatments of the raw data
prior to input to the CODAP clustering programs. The five data treatments were as follows: (a)
direct input of the raw ratings to the OVRLAP program, bypassing the usual INPSTD percentage
conversion described in Appendix A; (b) conversion of all non-zero ratings to a value of 1, with
all zeros left zero; (c) conversion of all non-zero ratings to values of 1, with zero ratings
ignored in the clustering programs; (d) conversion of all ratings by adding 1, producing a 1 to
10 rating scale, with no zeros in the analysis; and (e) a conversion designed to give higher
weight to the higher raw ratings. In this last conversion, all original non-zero ratings with a
value of X were transformed to 2**X, and all zeros ignored in the clustering. In every case,
these similarity measures generated much the same clustering group structure as the percent
training emphasis clustering. The CODAP clustering approach was consequently discarded as a
suitable analysis technique for identifying multiple rating policies.

Factor  Analysis 

A Q-type principal components factor analysis (MAX-FACTOR program) with a rater-by-rater
correlation matrix input (TRICOR program using ratings on a 0-9 scale) was applied to each
training emphasis sample. With this approach, raters were treated as variables loading on
factors (dimensions of common variance) which were interpreted as potential rating policies. The
customary criterion factor loading of .33 (approximately 11% of a rater's variance accounted
for) was taken as the minimum absolute value for meaningful rater contribution to a factor rating
policy. Each factor rating policy was defined by examining the pattern of rater loadings in
relation to considerations such as rater background characteristics, percent training emphasis
per duty allocation, or the rank-ordered task means for a factor rating policy group. The
relative strength of rating policies was determined by comparing their respective common
variances as proportions of total variance accounted for (%).
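
The loading criterion described above can be illustrated with a minimal sketch. It is not the
MAX-FACTOR or TRICOR code, whose internals are not described here. It assumes NumPy, a
tasks-by-raters array of 0-9 ratings, and synthetic data invented purely for illustration; the
function name `q_type_loadings` is hypothetical.

```python
import numpy as np

CRITERION = 0.33  # minimum absolute loading for a meaningful rater contribution


def q_type_loadings(ratings, n_factors=1):
    """Principal components of a rater-by-rater correlation matrix.

    ratings: array of shape (n_tasks, n_raters); raters are the variables.
    Returns (loadings of shape (n_raters, n_factors), percent of total variance).
    """
    corr = np.corrcoef(ratings, rowvar=False)       # rater-by-rater correlations
    eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]   # keep the largest components
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)
    for j in range(n_factors):                      # orient each factor positively
        if loadings[:, j].sum() < 0:
            loadings[:, j] *= -1
    pct_variance = 100.0 * eigvals / corr.shape[0]
    return loadings, pct_variance


# Synthetic sample (illustrative only): 8 raters share one policy, 2 rate at random.
rng = np.random.default_rng(0)
policy = rng.integers(0, 10, size=200).astype(float)
agreeing = np.clip(np.rint(policy[:, None] + rng.normal(0, 2.0, size=(200, 8))), 0, 9)
divergent = rng.integers(0, 10, size=(200, 2)).astype(float)
ratings = np.hstack([agreeing, divergent])

loadings, pct = q_type_loadings(ratings)
members = np.abs(loadings[:, 0]) >= CRITERION       # raters identified with the factor
print(members.sum(), "raters load on the general factor;", round(pct[0], 1), "% of variance")
```

Squaring the criterion gives .33 ** 2 = .109, which is the "approximately 11% of a rater's
variance" cited above.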

In contrast to cluster analysis, where rating policies are characteristic of rater groups
with mutually exclusive membership, factor analysis generates rating policies that are external
to the rater set by determining each rater's loading on each rating policy extracted. This
permits evaluation of rater performance across all policies. A further feature of this approach
is the capability to control the number of rating policies for analysis. Initially, the extent
to which a single general factor common rating policy prevails was investigated. By employing a
VARIMAX rotation/factor-building methodology, the relative utility of factor solutions consisting
of iteratively increasing numbers of rating policies was evaluated in order to establish the
multiple rating policy structure which best characterizes the sample and also to establish the
relationship between that structure and the CRP.

General factor solution. The general factor extracted in a one-factor solution accounts for
the greatest amount of shared variance within the data and is conceptualized as the CRP
underlying the total rater set. Analysis of the pattern of rater loadings on this factor
establishes the extent to which the CRP exists within the sample. All single-specialty samples
were found to have a factor CRP characterized by all significant loadings being unidirectional
and by an acceptable level of rater agreement. Except for AFSC 404X0, the common rating policy
accounted for the majority of raters. In contrast, the dual-specialty AFSC 328XX general factor
was comprised of bipolar significant loadings indicative of two strong specialty-specific rating
policies and preclusive of a CRP as the dominant policy for the total sample. Statistics and
details for this factor CRP for the single-specialty samples are presented in Table 6.

Table 6. Analysis Results for the General Factor (CRP)
for Each Specialty

         Number of                    % Total
AFSC     Raters(a)   Divergents(b)    Variance    R11    Rkk

404X0        22         25 (53%)        17.6      .22    .86
811X0        93         27 (23%)        23.0      .22    .96
328X0        54         11 (17%)        62.1      .54    .99
328X1        74          9 (11%)        37.0      .32    .97
672X2       125         24 (16%)        40.0      .38    .99
304X4       276         59 (18%)        20.0      .21    .99

(a) Number of Raters equates to loadings greater than the criterion minimum of .33 (11% of
variance).
(b) Parentheses contain number of divergents as percentage of total sample.

A detailed analysis of the high-low rater loading sequence on the single-specialty general
factors confirmed the notion that this factor represents the dominant theme which links the
majority of raters within the single-specialty samples. Iterative removal of raters from the low
loading end of the rank-ordered general factor loading sequence resulted in a steady increase in
R11 and Rkk despite decreasing sample size. This continual improvement of interrater
reliability is a function of the systematic reduction of error variance and establishes the
general factor loading sequence as an accurate distribution of rater performance with respect to
the CRP.

Comparison of the REXALL high-low rater correlation sequence (as produced by the sample task
mean vector) with the corresponding general factor high-low rater loading sequence for each
single specialty revealed a close matching in rater rank-orders and correlation/loading values
which tended to virtual equivalence with increasing total sample R11. Corresponding factor CRP
and REXALL analysis results are presented in Table 7. Except for AFSC 404X0, the CRP extraction
criteria for both analysis procedures identified similar or identical divergent rater sets.
Minor differences are due to the retention of a few REXALL doubtful raters (.30 < r < .40), the
inclusion (or exclusion) of whom can be demonstrated to generate negligible perturbations in the
rating policy task mean rank-order. For these five single-specialty samples, the REXALL grand
task mean vector performed adequately as a standard for determining the relative worth of all
raters with respect to the CRP. Large discrepancies between the factor and REXALL analysis
statistics for AFSC 404X0 were caused by the relatively large number of divergent raters (53%)
who did not identify significantly with the specialty CRP. Consequently, the sample task mean
vector produced a REXALL rater correlation sequence which did not reflect the relative worth of
raters with respect to the CRP. For this type of complex sample, routine REXALL analysis
procedures are inappropriate.

Table 7. Comparison of General Factor (CRP) and Second Iteration
Deletion Statistics for Each Specialty

          Number of Raters         R11               Rkk            % Deleted
AFSC      Factor   REXALL     Factor  REXALL    Factor  REXALL    Factor  REXALL

404X0        22       34        .22     .14       .86     .85        53      28
811X0        93       95        .22     .21       .96     .96        23      21
328X0        54       54        .54     .54       .99     .99        17      17
328X1        74       74        .32     .32       .97     .97        11      11
672X2       125      127        .38     .37       .99     .99        16      15
304X4       276      283        .21     .20       .99     .99        16      16

Note: R11 and Rkk are for the Number of Raters surviving deletion; i.e., the general
factor CRP comprised of raters with loadings >= .33 and REXALL results for raters with
correlations > .30 after two deletion passes.

Although factor analysis was intended primarily to deal with the identification of multiple
rating policies, the information conveyed by the one-factor solution, together with the
factor/REXALL analysis comparisons, permits modification of the original REXALL CRP extraction
criteria described in Section I of this report. In general terms, these findings demonstrate
that for single-specialty samples, the reliable CRP is derived via REXALL analysis when a level
of R11 >= .20 and Rkk >= .90 is attained by the successive deletion of sets of divergent raters
(r < .30), providing R11 increases with each deletion pass and no more than 25% to 30% of the
sample is deleted. Allowing for the deletion of this maximum number of divergent raters and
taking into account the R11 stability/sample size findings, it was found that a minimum sample
size of 55 raters was required to attain minimum acceptable interrater agreement. For smaller
samples dictated by rater availability, R11 >= .20 and Rkk >= .80 would be acceptable.
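
The sample-size reasoning above can be checked with a rough calculation. The sketch assumes
that Rkk is related to R11 by the Spearman-Brown formula, Rkk = k * R11 / (1 + (k - 1) * R11),
where k is the number of raters; this section of the report does not state the formula, so it
should be read as an assumption. The function names are hypothetical.

```python
import math


def rkk_from_r11(r11, k):
    """Reliability of the mean of k raters, assuming the Spearman-Brown relation."""
    return k * r11 / (1.0 + (k - 1) * r11)


def raters_needed(r11, target_rkk):
    """Smallest number of raters whose pooled reliability reaches target_rkk."""
    k = target_rkk * (1.0 - r11) / (r11 * (1.0 - target_rkk))
    return math.ceil(round(k, 9))  # round first to absorb floating-point noise


surviving = raters_needed(0.20, 0.90)            # raters left after deletion: 36
starting = math.ceil(surviving / (1.0 - 0.30))   # allow for 30% deletion: 52
print(surviving, starting)
```

Under this assumption, 36 surviving raters with R11 = .20 give Rkk = .90, and allowing for 30%
deletion raises the starting sample to 52, the same order as the 55-rater minimum reported
above (which also reflects the R11 stability findings). The same formula reproduces the
AFSC 404X0 factor entry in Table 7: 22 raters with R11 = .22 give Rkk = .86.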

Rotated factor solutions. The VARIMAX rotation redistributes rater variance in an attempt to
isolate the number of discrete rating policies that best characterizes the data in a meaningful
training sense. Theoretically, a principal components analysis requires as many factors (rating
policies) as there are variables (raters). The analysis produces them in order of decreasing
proportions of total variance accounted for. However, it is obvious that the number of useful
policies must be considerably less than the number of raters. The factor-building approach,
whereby an iteratively increasing number of factors are extracted and rotated, starting with the
two-factor solution, is based on the belief that, if significant multiple rating policies with
potential training application exist, they should be represented by those initial factors which
account for a high percentage of the total variance (%) after rotation. Ideally, these factor
rater groups would (a) display mutually exclusive membership, (b) account for most raters (with
loadings greater than the criterion minimum of .33), and (c) espouse significantly different
rating policies (r_s < .50). More specifically, the analysis is truncated at that optimal
utility point beyond which factors are dropped for interpretive purposes because they (a) consist
of few or no significant loadings, (b) account for relatively small amounts of variance, (c)
provide no further gains with respect to increasing the mutually exclusive membership of prior
main factors, and (d) demonstrate no potential training application.
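
For readers unfamiliar with the rotation step, a compact sketch of a standard varimax iteration
follows. It is a generic SVD-based formulation written with NumPy, not the program used in this
study, and the small two-factor loading matrix is invented for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (n_raters, n_factors) loading matrix.

    Returns (rotated loadings, rotation matrix).
    """
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion <= criterion * (1.0 + tol):
            break                              # no further meaningful gain
        criterion = new_criterion
    return L @ rotation, rotation


# Illustrative unrotated solution: a general factor plus a bipolar second factor.
unrotated = np.array([
    [0.7, 0.5], [0.8, 0.4], [0.6, 0.6],
    [0.6, -0.5], [0.7, -0.6], [0.5, -0.5],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each rater's communality (row sum of squared loadings) is
unchanged; only the distribution of that variance across factors changes, which is the
redistribution described above.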

Application of the VARIMAX rotation/factor-building technique to all samples identified
different rating policies (r_s < .50) in two instances: the complex single specialty, AFSC
404X0, and the dual-specialty sample, AFSC 328XX. For all other samples, the rotated solution
analyses reinforced the CRP as the dominant rating policy by identifying two or three main
internal rating themes as minor variations of the CRP.

The three-factor solution for AFSC 404X0 appeared to be optimal. Factor group membership was
mutually exclusive and accounted for 80% of the sample. Divergent raters who were not accounted
for did not share significant variance beyond the three-factor solution. Statistics for the
single- and three-factor solutions, together with details for the associated rating policies, are
provided in Table 8. Pairwise correlation coefficients (Spearman's r_s) among the three factors
(3F1, 3F2, and 3F3) were low: 3F1/3F2 had r_s = .103, 3F1/3F3 had r_s = .074, and 3F2/3F3 had
r_s = .305. These values indicate significant high-priority task/duty differences (see Table
8). The rater policy groups were identified by the predominant duties they performed: (a)
photographic processing and support equipment, (b) camera and audiovisual maintenance, and (c)
camera maintenance. In summary, the AFSC 404X0 sample is comprised of three discrete and
significantly different rating policies, one of which duplicates a very weak CRP. When combined,
these competing multiple policies render the total sample complex and unsuitable for REXALL
analysis. Details of the three-factor solution for AFSC 404X0 are given in Table 9.

Table 8. General and Rotated Factor Statistics for AFSC 404X0

                                                  No. of
                                                  High-
Factor             No. of   % Total               Priority   No. of High-Priority Tasks by Duty
Solution   Group   Raters   Variance  R11   Rkk   Tasks        E   F   G   H   I   J   K   L   M

General    CRP       22       17.6    .22   .86    139        11  35  56  24   0   0   0   0  13
Factor

Rotated    3F1       16       16.8    .32   .91    146        11  34  64  29   0   0   0   0  10
Factors    3F2       13       10.9    .22   .78    130         9   8   9   6  41  16  15  20   6
           3F3        9        9.7    .03   .23     40         7   1   0   0  20   8   4   0   0

Notes: Factor group membership is determined by the number of loadings greater than or
equal to the criterion minimum of .33. Group rating policies are described in terms of duty
emphases associated with high training priority tasks identified by the FACPRT program.
High-priority tasks are defined as those tasks with a mean rating greater than or equal to one
standard deviation above the mean of task means. The frequency distributions of rating policy
task means revealed that, complementary to their respective high-priority tasks, GRP 3F1 and GRP
3F2 assign zero-to-low training emphasis to approximately 50% of all tasks whereas GRP 3F3
allocates an average to above-average training emphasis to 95% of all tasks.
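
The high-priority task definition in the notes to Table 8 amounts to a simple cutoff, sketched
below. This is not the FACPRT program; the function name is hypothetical, the task means are
invented, and the use of the sample standard deviation (ddof=1) is an assumption, since the
notes do not say which form was used.

```python
import numpy as np


def high_priority_tasks(task_means):
    """Return (indices of high-priority tasks, cutoff).

    A task is high priority when its mean rating is at least one standard
    deviation above the mean of all task means.
    """
    task_means = np.asarray(task_means, dtype=float)
    cutoff = task_means.mean() + task_means.std(ddof=1)  # ddof=1 is an assumption
    return np.flatnonzero(task_means >= cutoff), cutoff


# Invented task means for one rating policy group.
example_means = [1.0, 1.0, 1.0, 1.0, 9.0]
indices, cutoff = high_priority_tasks(example_means)
print(indices, round(cutoff, 2))
```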


Table 9. Rotated Factor Solution for AFSC 404X0

Factor   Number of
Group    Raters       R11    Rkk    Rating Policy

3F1         16        .32    .91    Photographic Processing and Support Equipment
3F2         13        .22    .78    Camera and Audiovisual Maintenance
3F3          9        .03    .23    Camera Maintenance

Details for the optimal three-factor solution for the dual-specialty AFSC 328XX sample are
presented in Table 10. The two main factor groups, 3F1 and 3F2, establish two uniquely different
specialty-specific rating policies virtually identical to those extracted via the separate
analysis of the two component specialties. Group 3F3 consists of raters who, by rating across
all duties, formulate a minor CRP for the total sample. The mutual exclusivity of factor group
membership and the low rank-order correlations between the rating policies they represent render
the total sample complex and unsuitable for REXALL analysis. The r_s values for the comparisons
were 3F1/3F2, r_s = -.344; 3F1/3F3, r_s = -.088; and 3F2/3F3, r_s = .487.
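
A rank-order comparison of this kind can be sketched as follows. The code is not the FACCOR
program; it computes Spearman's r_s as the Pearson correlation of average ranks, and the
task-mean vectors are invented for illustration. The .50 threshold is the one used in this
report.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rs(x, y):
    """Spearman rank-order correlation between two task-mean vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def policies_differ(means_a, means_b, threshold=0.50):
    """True when the two rating policies correlate below the threshold."""
    return spearman_rs(means_a, means_b) < threshold


# Invented task means for two rating policy groups over five tasks.
policy_a = [6.1, 5.4, 0.8, 0.2, 3.0]
policy_b = [0.5, 1.1, 6.3, 5.9, 2.7]
rs = spearman_rs(policy_a, policy_b)
print(round(rs, 3), policies_differ(policy_a, policy_b))
```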

The rotated solutions for the remaining five single-specialty samples share common features
which disqualify the component factors as meaningful multiple rating policies. Each sample is
comprised of rating policies that are minor variations in the CRP. This is evidenced by (a) high
inter-policy rank-order correlations, r_s > .50, (b) rank-order correlations with the CRP in the
range of .70 to .99, (c) non-mutually exclusive membership, (d) high training priority tasks
which are largely accounted for by the CRP high training priority tasks, and (e) rater
memberships which are subsets of the CRP membership. These five single specialties are
appropriately classified as simple or non-complex in that the REXALL CRP reliably subsumes the
competing component rating policies.

Table 10. Rotated Factor Solution for AFSC 328XX

Factor   Number of
Group    Raters       R11    Rkk    Rating Policy

3F1         54        .56    .99    AFSC 328X0 CRP (incl. one 328X1)
3F2         71        .33    .97    AFSC 328X1 CRP (incl. two 328X0)
3F3         16        .20    .06    AFSC 328XX CRP (eleven 328X1 and five 328X0)

III.  APPLICATIONS 

1. REXALL analysis incorporating the new CRP extraction criteria is appropriate for
establishing the overall recommended training priority for a single-specialty sample. The REXALL
configuration of a single-specialty sample likely to contain a reliable CRP is one with the
following characteristics:

a. Single-rater reliability, R11 > .15.

b. Approximately 65% (or more) of raters with correlations, r > .40.

c. Some rater correlations, r > .70.

2. REXALL rater correlation guidelines for retaining or rejecting raters as being reliable
or divergent with respect to the CRP are as follows:

a. If r > .40, reliable rater; retain.

b. If .30 < r < .40, doubtful rater; analyze rating pattern before retaining or
rejecting.

c. If r < .30 and/or t-value < 3.0, divergent rater; reject.
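
The retention guidelines in item 2 can be expressed as a small decision function. The sketch
below is illustrative only: the function name is hypothetical, the boundary cases (r exactly
.30 or .40) are assigned by assumption, and "and/or" is read as meaning that either a low
correlation or a low t-value is enough to reject.

```python
def classify_rater(r, t_value=None):
    """Classify a rater from the correlation with the CRP task mean vector.

    r: rater correlation; t_value: optional t statistic for that correlation.
    """
    # Assumption: either condition alone marks the rater as divergent.
    if r < 0.30 or (t_value is not None and t_value < 3.0):
        return "divergent"   # reject
    if r < 0.40:
        return "doubtful"    # analyze rating pattern before deciding
    return "reliable"        # retain


for r, t in [(0.55, 9.1), (0.35, 5.0), (0.12, 1.4)]:
    print(r, t, classify_rater(r, t))
```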

3. Rating pattern analysis to support the retention or rejection of doubtful raters
consists of evaluating the extent to which the following individual rater characteristics diverge
from the majority rating pattern:

a. Total number of non-zero responses.

b. Mean rating and standard deviation on the 1 to 9 scale.

c. Distribution of non-zero ratings on the 1 to 9 scale.

d. Distribution of non-zero ratings across duty areas.

e. Distribution of percentage training emphasis across duty areas.

These rater characteristics are available from the CODAP PRTDIS (for 3a, 3b, and 3c) and DUVARS
(for 3d and 3e) programs. Rater sequencing can be in normal numeric input order or KPATH order.
The latter sequence, which requires additional computing via the CODAP clustering programs
(OVRLAP, GROUP, and KPATH), separates the rater sample into subgroups of raters with highly
similar rating patterns and isolates raters with diverging rating patterns.

4. Application of these criteria and guidelines would ensure extraction of a reliable CRP
(if it exists) with a single-rater reliability R11 >= .20. The interrater reliability for the
final set of CRP raters (Rkk) will depend on the number of good raters surviving deletion. To
maximize attainment of Rkk >= .90, a minimum safe sample size of N >= 55 is desirable. For
smaller samples, an Rkk >= .80 is acceptable.

5. Principal components factor analysis is appropriate for the analysis of complex single
specialties which fail to attain acceptable interrater agreement with REXALL analysis using the
new CRP extraction criteria and for multi-ladder survey data with a high potential for
specialty-aligned multiple rating policies. The number and type (unidirectional or bipolar) of
significant loadings on the one general factor solution will define the extent to which a CRP
exists for a sample. Application of the VARIMAX rotation/factor-building analysis technique will
determine the extent to which competing multiple rating policies exist within the sample.

6. In seeking a multiple factor solution, factor extraction and rotation should be stopped
when the factors identified are found to satisfy the following guidelines:

a. High proportion of total variance accounted for.

b. Most raters are accounted for (loadings >= .33) while remaining divergent raters
(loadings < .33) are few and not included within the main factor structure.

c. Results remain relatively stable upon further extraction.

d. The policies found appear reasonable, with potential for generating coherent
training strategies.

7. The veracity of a rotated solution reflecting intended rater training recommendations is
directly proportional to the level of single-rater reliability (R11) within each policy and to
the extent that interpretable differentiation exists between factor policy/groups in terms of the
following:

a. Mutually exclusive group membership.

b. Rank-order correlations (r_s < .50).

c. High training priority tasks.

d. Common background variables.

IV. CONCLUSIONS


1. Factor analyses of the six single-specialty training emphasis samples in this report,
although uncovering more than one rating policy in each case, have demonstrated them to be less
'complex' than anticipated. For five of these specialties, there was no practical difference
(r_s > .50) between the rating policies.

2. REXALL analysis employing the new CRP extraction criteria is adequate for a CRP including
all raters (ideal) and for a CRP with divergency less than 25% (e.g., AFSCs 328X0, 328X1, 811X0,
672X2, and 304X4).

3. REXALL analysis is inadequate for the following sample types: (a) two or more competing
rating policies (e.g., AFSC 404X0), (b) no main policies, and (c) multi-ladder surveys (e.g.,
AFSC 328XX).

4. Modified REXALL analysis and CODAP cluster analysis (normal or experimental types) are
not adequate for identifying multiple rating policies.

5. The CODAP auxiliary summary programs (DUVARS, PRTDIS, PRTVAR, and FACPRT) have high
utility for interpretation of REXALL and factor analyses.

6. Principal components factor analysis has a high utility for identifying the CRP and
multiple rating policies.

APPENDIX A: CODAP CLUSTERING DESCRIPTION


The main clustering programs are INPSTD, OVRLAP, GROUP, and DIAGRM. Initially, INPSTD adjusts
each rater's task ratings (0 to 9 scale) to a percentage of the sum of that rater's training
emphasis ratings, %TE. This adjustment standardizes all raters to a common mean of 100/NTASK.
(NTASK is the total number of tasks in the inventory.) The OVRLAP program establishes a
rater-by-rater similarity matrix using percent training emphasis in common (sum of linear overlap
on corresponding tasks) as the measure of similarity. This matrix is collapsed by the GROUP
program to form groups of raters with similar rating patterns. Each pair of raters or rater
groups which merge during the grouping is given a contiguous block of (KPATH) sequence numbers.
The hierarchical relationship between raters/groups can be graphically displayed via the DIAGRM
program. A valuable CODAP feature is the set of auxiliary programs that can be utilized to
report rater and group data summaries. Raters' training emphases, in terms of number of tasks
rated (non-zero) per duty category and percentage of training emphasis per duty, are summarized
in the DUVARS program printout. Rating patterns are summarized in the PRTDIS program printout,
which details each rater's performance on the 1 to 9 scale in terms of total number of tasks
rated and mean, standard deviation, and distribution of ratings. These summaries are especially
relevant to group structure considerations when raters are listed in KPATH sequence. Analysis of
the PRTVAR program output allows determination of the extent to which biographical and computed
variables are shared by rater groups. For any selected cluster group, the JOBGRP program
computes the percent training emphasis per duty summary as a general description of the group
rating policy. Task-level differences between group rating policies can be highlighted by the
comparison of task means across groups using the FACPRT program. Rank-order correlations between
group task mean vectors, using the FACCOR program, test for rating policy differences.
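
The percentage conversion and overlap measure described in this appendix can be sketched in a
few lines. This is not the INPSTD or OVRLAP code: the function names are hypothetical, the
overlap is taken to be the sum over tasks of the smaller of two raters' percentages (one
reading of "sum of linear overlap on corresponding tasks"), every rater is assumed to have at
least one non-zero rating, and the ratings are invented.

```python
import numpy as np


def percent_training_emphasis(ratings):
    """Convert each rater's ratings (rows) to percentages of that rater's total."""
    ratings = np.asarray(ratings, dtype=float)
    totals = ratings.sum(axis=1, keepdims=True)   # assumes no all-zero rater
    return 100.0 * ratings / totals


def overlap_matrix(percent):
    """Rater-by-rater similarity: percent training emphasis held in common."""
    return np.minimum(percent[:, None, :], percent[None, :, :]).sum(axis=2)


# Three invented raters over three tasks (0 to 9 scale).
ratings = np.array([
    [8, 0, 2],
    [4, 0, 1],   # same pattern as the first rater at half the rating level
    [0, 5, 5],
])
percent = percent_training_emphasis(ratings)
overlap = overlap_matrix(percent)
print(np.round(percent, 1))
print(np.round(overlap, 1))
```

In this example the first two raters overlap by 100% even though one rates at twice the level
of the other, which illustrates the loss of information about rating magnitude noted as the
first limitation of the clustering approach. Each rater's percentages average 100/NTASK.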